The Rockley Report

Gaining Management Support

XML and DTDs: The Buy vs Build Argument

Steve Manning, The Rockley Group

When deciding to adopt XML as an authoring standard/backbone, one has to consider the question, "Do we need to create our own DTD?" Some will tell you that there are ready-made DTDs out there for you to grab and use. Others will tell you that you must either start from scratch or forget about it. So what's the answer? Here are the pluses and minuses in the Buy vs. Build debate to help you decide what makes sense for you and get the DTD you need.

It's a common question for anyone contemplating a move to XML: "Do we need to create our own DTD?" The answer, guaranteed to frustrate you, is "Maybe ... maybe not."

You have four choices:

  • Build your own DTD from scratch
  • Adopt an existing (possibly industry-standard) DTD as is
  • Modify an existing DTD
  • Create your own DTD as a layer on top of an existing/industry-standard DTD

What's available

Let's start by discussing what's available. DTD is used here to represent any predefined, "validatable" structure. That is, the structure could be a DTD (XML or SGML), or it could be a schema (the XML-based equivalent to the DTD). Both types of structures are readily available for use.
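To make "validatable" concrete, here is a minimal Python sketch of the kind of check that a DTD or schema lets a parser perform. It is only an illustration. The ALLOWED_CHILDREN table, the element names (procedure, title, step, caution), and the find_violations helper are all hypothetical, and the table is a hand-rolled stand-in for a real DTD, which a validating parser would enforce for you (including element order and repetition, which this sketch ignores).

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: which child elements each element may contain.
# A real DTD also constrains order and repetition; this sketch does not.
ALLOWED_CHILDREN = {
    "procedure": {"title", "step", "caution"},
    "title": set(),
    "step": {"caution"},
    "caution": set(),
}


def find_violations(xml_text):
    """Return a list of messages for elements that break the content model."""
    root = ET.fromstring(xml_text)
    problems = []
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            problems.append(f"unknown element <{parent.tag}>")
            continue
        for child in parent:
            if child.tag not in allowed:
                problems.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
    return problems


valid_doc = "<procedure><title>Reset</title><step>Press the button.</step></procedure>"
invalid_doc = "<procedure><title>Reset<step>Oops</step></title></procedure>"

print(find_violations(valid_doc))    # no problems reported
print(find_violations(invalid_doc))  # reports the misplaced <step>
```

Whether you buy or build, this rule set is what you are really acquiring: a shared, machine-checkable statement of what a well-structured document looks like.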

Note that you don't necessarily have to buy a DTD to use it. Many DTDs and schemas are open source and free to use. To get an idea of just how many, browse the Applications section of the Cover Pages (xml.coverpages.org) for a rough list, or go to xml.schemas.org and see what's available there.

For technical writers, a couple of the noteworthy structures are the Darwin Information Typing Architecture (DITA) and DocBook.

DITA

DITA gives you the building blocks for creating your own topic-based markup and is described as follows:

The Darwin Information Typing Architecture (DITA) is an XML-based, end-to-end architecture for authoring, producing, and delivering technical information. This architecture consists of a set of design principles for creating "information-typed" modules at a topic level and for using that content in delivery modes such as online help and product support portals on the Web. [1]

DocBook

DocBook works out of the box and it comes in a couple of different forms (full and simplified) and with a suite of stylesheets. DocBook is defined as follows:

DocBook is a DTD maintained by the DocBook Technical Committee of OASIS. It is particularly well suited to books and papers about computer hardware and software (though it is by no means limited to these applications). [2]

It has to be "production worthy"

Beyond knowing what's available, you also need to consider what makes an XML implementation production worthy, regardless of the tool. Too often, companies focus on the content model defined in a DTD, or the availability of ready-made stylesheets, and forget about the environment in which the DTD will be used. Some of the factors that can affect XML implementation are described below.

Exposing the XML

For publications, XML authoring begins in the authoring tool. A typical authoring scenario goes something like this: users sit at their computers, select File > New, and choose a type of document to create. The list of document types corresponds to the list of DTDs (Document Type Definitions) that the authoring tool knows about. (The details might vary, but this is the basic procedure for creating a new document in pretty much all XML editors.)

The operative word in this scenario is "users" because defining production-worthy always begins with users. Typical users are a little nervous about technology. They've heard about XML, but don't really understand it. And, in a good XML implementation, they shouldn't need to know much about it. That's the first rule of production worthy: hide as much of the XML from users (in this case, authors) as possible, because it scares them! Users need to understand structure and structured authoring. They also need to understand the concepts of attributes and metadata. But they don't need to understand things like the coding rules of XML. That should be hidden. Expose only as much XML as necessary.

Reducing the learning curve

Hiding the complexities of XML from users removes "XML Training" from the list of courses that people usually think are required for moving to XML. XML training is required, but not for all users. Hiding the XML is one way of reducing the learning curve. There's also a second way: provide users with a DTD where the element names are meaningful to them.

One of the great benefits of building a DTD is that you get to create the tag names. XML itself is not a markup language, but a standard for creating markup languages. If you are creating your DTD from scratch, you get to make up all of the tag names. That's an advantage, because you get to give your structural elements names that have meaning to you and your authors. That's the second way of reducing the learning curve. Your tag names will fit into the natural language of your authors, making your markup easy to use and author in.

On the other hand, buying a DTD brings the risk of imposing a new language on your authors. You call it a "caution"; the other markup calls it an "alert." This can lead to confusion and inefficiency, and it will definitely add to the learning curve.

Supporting information exchange

One of the arguments people use when promoting industry-standard DTDs is that they come with the ability to improve information exchange. This can be a very persuasive argument to adopt an industry-standard or existing DTD. But, so what if you have to exchange information? You already do. You exchange it with users. Okay, I'm being obtuse; the point of information exchange is to share the source so it can be reused by others. But, you needn't approach it any differently than delivering information to users.

There are two phases of information development when using a markup language: the authoring phase and the delivery phase. Information exchange is just another delivery output. With XML, you have the opportunity to improve the efficiency of authoring and the efficiency of delivery. So why not do both? Make the authoring version as effortless to use as possible, and transform it (using XSL stylesheets) into as many output (delivery) languages as you need. You can author against a custom (or modified) DTD and transform the content to a standard DTD in order to exchange information. That is not to say that an industry-standard DTD will never match the language/terminology of your authors. It might. If it does, you've got the best of both worlds: efficient authoring and easy source sharing.
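As a rough illustration of treating exchange as just another delivery output, the sketch below renames elements from an authoring vocabulary to an exchange vocabulary. The article's approach uses XSL stylesheets; Python's standard ElementTree module is used here only as a stand-in. The mapping table and the to_exchange_markup function are hypothetical, and only the caution/alert pair comes from the example discussed earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the authors' vocabulary to an exchange vocabulary.
# Only "caution" -> "alert" echoes the article's example; the rest is invented.
AUTHORING_TO_EXCHANGE = {
    "procedure": "task",
    "caution": "alert",
}


def to_exchange_markup(xml_text, mapping=AUTHORING_TO_EXCHANGE):
    """Rename elements per the mapping; text, attributes, and order are kept."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Elements without a mapping entry keep their original name.
        element.tag = mapping.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")


authored = (
    "<procedure><caution type='electrical'>Unplug the unit first.</caution>"
    "<step>Remove the cover.</step></procedure>"
)
exchanged = to_exchange_markup(authored)
print(exchanged)
```

A one-to-one rename is the easy case. Where the two structures differ in nesting or granularity, the transform has to restructure content as well, which is where most of the real effort tends to go.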

The bottom line is, to be considered production-worthy, an XML implementation must be optimized for both authoring and for delivery.

Supporting multiple outputs

The buy vs. build decision can also extend to stylesheets. XML requires stylesheets for output. You associate a stylesheet with an XML file, pass it through an output generator, and get the appropriate output format (e.g., HTML, PDF, or other XML markup). Generating multiple formats requires multiple stylesheets and possibly multiple output generators. For authors or publishers, this is not necessarily a big deal. Technically, it's not much more complicated than associating a Word document with a specific template.

However, the complexity falls on the individuals creating the stylesheets. The more complex the output, the more complex the stylesheet. The more complex the list of outputs, the more complex the maintenance of those stylesheets will be. How many stylesheets might be affected by a style change? Can they be modularized to share common style properties?
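The modularization question can be sketched in a few lines. The example below is an assumption-laden illustration in Python, not a real stylesheet: SHARED_STYLE, render_html, and render_text are hypothetical names, and the two functions stand in for two separate output stylesheets that draw on one shared set of style properties.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical shared style properties, defined once and used by every output.
SHARED_STYLE = {
    "caution": {"label": "CAUTION", "color": "#b00020"},
    "note": {"label": "NOTE", "color": "#00509d"},
}


def render_html(element):
    """Render one admonition element as an HTML paragraph."""
    style = SHARED_STYLE[element.tag]
    label = style["label"]
    color = style["color"]
    text = escape(element.text or "")
    return f'<p style="color: {color}"><strong>{label}:</strong> {text}</p>'


def render_text(element):
    """Render the same element as plain text, reusing the shared label."""
    label = SHARED_STYLE[element.tag]["label"]
    return f"{label}: {element.text or ''}"


source = ET.fromstring("<caution>Unplug the unit first.</caution>")
print(render_html(source))
print(render_text(source))
```

Changing the label or color in SHARED_STYLE updates both outputs at once. In XSLT, the comparable technique is to keep common templates and settings in one stylesheet and pull it into the others with xsl:import or xsl:include.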

Here is where there are some advantages to the bought or borrowed DTDs. They usually come with stylesheets that are, at the very least, excellent starting points that you can modify for your own use. Some, like DocBook, come with a suite of stylesheets for things like HTML pages, Web Sites, HTML Help, and PDF. Building upon an existing stylesheet can really shorten the implementation process.

Ranking the choices

So how do you choose the best approach? The choices are:

  • Build your own DTD from scratch
  • Adopt an existing (possibly industry-standard) DTD as is
  • Modify an existing DTD
  • Create your own DTD as a layer on top of an existing/industry-standard DTD

Build your own DTD from scratch

This is the most time-consuming approach, but it also has the most potential for getting exactly what you want from a DTD.

Pros

  • You get exactly what you want
  • Greatest opportunity to improve both the authoring and delivery sides of content
  • Shortest learning curve
  • Easiest to get buy-in from authors

Cons

  • Time consuming
  • Technically demanding

Adopt an existing (possibly industry-standard) DTD as is

The fastest approach to implementation, but you risk limiting the effectiveness.

Pros

  • Shortest time to implement
  • Facilitates source sharing

Cons

  • Long learning curve where the language of the markup tags is not natural to the users
  • Could have more tags or fewer tags than you really need
  • Stylesheets, if they exist, are designed for someone else's styles

Modify an existing DTD

Faster than starting from scratch, but may have issues for long-term maintenance.

Pros

  • Takes less time than starting from scratch
  • Can build on existing stylesheets
  • Can add tags when tags are missing or delete tags when not needed

Cons

  • Long learning curve if the language of the markup tags is not natural to the users
  • Modifying can be technically demanding
  • Can be difficult to maintain when the source DTD changes

Create your own DTD as a layer on top of an existing/industry-standard DTD

This requires that you author in a tag set of your own making, then transform the markup into that of an existing DTD/standard.

Pros

  • You can create a markup set that meets the users' needs exactly
  • You can still take advantage of stylesheets supplied with the DTD
  • You can optimize for both authoring and delivery

Cons

  • Still requires technical expertise to modify the stylesheets to match your style

Summary

There are advantages and disadvantages to both starting a new DTD from scratch and using or modifying an existing DTD. When starting with an existing DTD, you can build upon the information models represented by the DTD, as well as take advantage of the stylesheets that frequently accompany it. The disadvantage is that you have to adapt your authoring to someone else's vision of the content. Starting from scratch means that the content models match your work, but the effort to create everything (the DTD and all the stylesheets) may be prohibitive.

References

[1] Description of DITA is available at http://www-106.ibm.com/developerworks/xml/library/x-dita1/

[2] Description of DocBook is available at www.docbook.org

Copyright 2004, The Rockley Group, Inc.